Entry Name:  "VT-AVIST-MC3"

VAST 2013 Challenge
Mini-Challenge 3: Visual Analytics for Network Situation Awareness

 

 

Team Members:

Peng Mi (mipeng@vt.edu) (Primary Contact)

Yong Cao (yongcao@vt.edu)

Virginia Polytechnic Institute and State University

Student Team:  No

 

Analytic Tools Used:

Animated Visualization Toolkit (AVIST) is a GPU-accelerated visualization tool developed at Virginia Tech. It provides real-time animated visualization of streaming datasets and multiple coordinated views. AVIST uses the parallel computing capacity of GPUs to visualize and analyze large datasets: with parallel algorithms for geometry generation and rendering, it delivers real-time visual analytics over millions of data records.

 

AVIST provides four coordinated views: histogram view, parallel coordinate view, dynamic view, and graph view. It also supports three kinds of complex filters expressed in disjunctive normal form (DNF): highlight filters, exclusive filters, and negative exclusive filters. Combining the four coordinated views with the three DNF filters helps users identify patterns and unexpected dynamic events in large datasets. Finally, AVIST supports time-synced visualization of multiple datasets, such as the IPS, network flow, and computer status datasets in this challenge.
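AVIST's filter implementation is not described in this entry, so the following is only a minimal Python sketch of how a DNF filter (an OR of AND-clauses) can be evaluated over flow records. It assumes each record is a dict keyed by the network flow column names. The semantics of the three filter kinds are our assumption, inferred from how they are used later in this document: an exclusive filter hides matching records, a negative exclusive filter keeps only matching records, and a highlight filter colors matching records without removing any. All function names are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional, Set, Tuple

Record = Dict[str, Any]
Predicate = Callable[[Record], bool]
# A DNF filter is an OR of clauses; each clause is an AND of predicates.
DNF = List[List[Predicate]]


def field_in(field: str, values: Set[Any]) -> Predicate:
    """Predicate that holds when record[field] is one of `values`."""
    return lambda rec: rec.get(field) in values


def matches(dnf: DNF, rec: Record) -> bool:
    """True if at least one clause has all of its predicates satisfied."""
    return any(all(pred(rec) for pred in clause) for clause in dnf)


# Hypothetical filter semantics (assumed, not AVIST's actual API).
def exclusive_filter(records: List[Record], dnf: DNF) -> List[Record]:
    """Hide every record that matches the filter."""
    return [r for r in records if not matches(dnf, r)]


def negative_exclusive_filter(records: List[Record], dnf: DNF) -> List[Record]:
    """Keep only the records that match the filter."""
    return [r for r in records if matches(dnf, r)]


def highlight_filter(
    records: List[Record], dnf: DNF, color: str
) -> List[Tuple[Record, Optional[str]]]:
    """Keep every record, tagging the matching ones with a color."""
    return [(r, color if matches(dnf, r) else None) for r in records]


# Toy records for illustration only.
flows = [
    {"firstSeenSrcIp": "172.10.0.6", "firstSeenDestPort": 0},
    {"firstSeenSrcIp": "172.20.0.15", "firstSeenDestPort": 80},
    {"firstSeenSrcIp": "10.3.1.25", "firstSeenDestPort": 21},
]
# (destPort = 80) OR (destPort = 21): common web and FTP traffic.
common_ports: DNF = [
    [field_in("firstSeenDestPort", {80})],
    [field_in("firstSeenDestPort", {21})],
]
remaining = exclusive_filter(flows, common_ports)
print([r["firstSeenSrcIp"] for r in remaining])  # ['172.10.0.6']
```

Representing each clause as a list of predicates keeps the OR/AND structure explicit, so a clause such as "source IP is X AND port is not Y" is just two predicates in one inner list.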

 

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2013 is complete?  Yes

 

 

Video:

http://people.cs.vt.edu/mipeng/vast_2013/vt-avist-mc3.wmv

 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Questions

 

MC3.1 – Provide a timeline (i.e., events organized in chronological order) of the notable events that occur in Big Marketing’s computer networks for the two weeks of supplied data. Use all data at your disposal to identify up to twelve events and describe them to the extent possible.  Your answer should be no more than 1000 words long and may contain up to twelve images.

 

Week One DataSet

 

Event 1:

Situation: In the network flow dataset, we identified records whose “ipLayerProtocolCode” is OTHER. These records indicate that “firstSeenSrcIp” 172.10.0.6 used “firstSeenSrcPort” 0 to scan many destination IPs on “firstSeenDestPort” 0.

Time: throughout Week One (07:50, 4/1/2013 ~ 05:51, 4/7/2013)
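The port-0 scan in Event 1 was found visually, but the same pattern can be checked programmatically. The sketch below assumes flow records are dicts keyed by the network flow column names with ports parsed as integers; the function name and the `min_targets` threshold are hypothetical. It counts, for each source IP, the distinct destinations reached by protocol-OTHER traffic from port 0 to port 0.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Set


def port0_scanners(
    records: Iterable[Dict[str, Any]], min_targets: int = 50
) -> Dict[str, int]:
    """Return {source IP: distinct destination count} for sources that send
    protocol-OTHER traffic from port 0 to port 0 and reach at least
    `min_targets` distinct destinations (a hypothetical threshold)."""
    targets: Dict[str, Set[str]] = defaultdict(set)
    for r in records:
        if (
            r["ipLayerProtocolCode"] == "OTHER"
            and r["firstSeenSrcPort"] == 0
            and r["firstSeenDestPort"] == 0
        ):
            targets[r["firstSeenSrcIp"]].add(r["firstSeenDestIp"])
    return {ip: len(d) for ip, d in targets.items() if len(d) >= min_targets}


# Toy records for illustration only.
flows = [
    {"ipLayerProtocolCode": "OTHER", "firstSeenSrcIp": "172.10.0.6",
     "firstSeenSrcPort": 0, "firstSeenDestIp": f"10.0.0.{i}",
     "firstSeenDestPort": 0}
    for i in range(3)
]
print(port0_scanners(flows, min_targets=3))  # {'172.10.0.6': 3}
```

Counting distinct destinations rather than raw records avoids flagging a host that merely sends many packets to a single peer.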

 

Event 2

Situation: The computer with “firstSeenSrcIp” 172.10.0.6 used its port 1984 (the Big Brother network monitor port) to scan the Big Marketing network. Meanwhile, this computer also used its port 0 to scan the network. In the following picture, green represents port 1984 and red represents port 0.

Time: throughout Week One.

 

Event 3

The computer with an unknown “firstSeenDestIp” 10.3.1.25 periodically scanned the email servers of the Big Marketing network. Interestingly, it sometimes scanned all three email servers together, and at other times only one or two of them.

 

Event 4

Situation: The Network Health and Status data shows an interesting pattern. Records for different service names are not interleaved; instead, they form distinct phases. First, warnings (green) dominate; then there is a period with no warnings; then many errors (blue) appear. The other colors represent other service names. The warnings lasted from 09:05, 4/1/2013 to 06:50, 4/2/2013 and were associated with the “service name” “disk”; the errors lasted from 00:42, 4/3/2013 to 18:47, 4/3/2013 and were associated with the “service name” “conn”.
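The phases in Event 4 were read off the visualization. A minimal sketch of the same summary, assuming health records are dicts with hypothetical keys `service`, `status`, and `time` (the real column names and status encoding may differ), computes the first and last timestamp of each (service name, status) pair. The toy records below only illustrate the shape of the computation.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, Tuple

Span = Tuple[datetime, datetime, int]  # (first seen, last seen, record count)


def status_spans(
    records: Iterable[Dict[str, Any]],
    statuses: Tuple[str, ...] = ("warning", "error"),
) -> Dict[Tuple[str, str], Span]:
    """For each (service name, status) pair among `statuses`, keep the
    earliest and latest timestamp and the number of matching records."""
    spans: Dict[Tuple[str, str], Span] = {}
    for r in records:
        if r["status"] not in statuses:
            continue  # ignore healthy records
        key = (r["service"], r["status"])
        t = r["time"]
        if key in spans:
            first, last, n = spans[key]
            spans[key] = (min(first, t), max(last, t), n + 1)
        else:
            spans[key] = (t, t, 1)
    return spans


# Toy records (hypothetical field names and status values).
health = [
    {"service": "disk", "status": "warning", "time": datetime(2013, 4, 1, 9, 5)},
    {"service": "disk", "status": "warning", "time": datetime(2013, 4, 2, 6, 50)},
    {"service": "conn", "status": "error", "time": datetime(2013, 4, 3, 0, 42)},
    {"service": "cpu", "status": "ok", "time": datetime(2013, 4, 3, 1, 0)},
]
for (service, status), (first, last, n) in status_spans(health).items():
    print(service, status, first, last, n)
```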

 

Event 5

Situation: The web server of Branch Three (“firstSeenSrcIp” 172.30.0.4) scanned a set of unknown destination IPs (“firstSeenDestIp”) ranging from 10.6.XX to 10.206.XX using port 80. The scanning took place in three separate periods, each lasting two and a half hours.
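The three scanning periods in Event 5 can be recovered from timestamps by splitting activity wherever a long silence occurs. This is a minimal sketch, assuming the input is the list of timestamps of the scanning records; the function name and the 30-minute gap threshold are hypothetical choices, and the toy data merely mimics three 2.5-hour bursts.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def split_sessions(
    times: List[datetime], max_gap: timedelta
) -> List[Tuple[datetime, datetime]]:
    """Group timestamps into sessions: a new session starts whenever the gap
    since the previous timestamp exceeds `max_gap`. Returns (start, end) pairs."""
    sessions: List[Tuple[datetime, datetime]] = []
    for t in sorted(times):
        if sessions and t - sessions[-1][1] <= max_gap:
            sessions[-1] = (sessions[-1][0], t)  # extend the current session
        else:
            sessions.append((t, t))  # gap too large: open a new session
    return sessions


# Toy data: three bursts of scan records, one every 10 minutes for 2.5 hours.
base = datetime(2013, 4, 1, 10, 0)
times = [base + timedelta(days=d, minutes=m) for d in range(3) for m in range(0, 151, 10)]
for start, end in split_sessions(times, timedelta(minutes=30)):
    print(start, "duration:", end - start)
```

The gap threshold controls how aggressively bursts are merged; it should be larger than the normal spacing of scan records and smaller than the quiet periods between scans.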

 

 

Event 6

Situation: After the network breakdown, a large number of UDP packets on port 1900 (Microsoft SSDP, which enables discovery of UPnP devices) were sent across the whole network to “firstSeenDestIp” 239.255.255.250, the SSDP multicast address, between 03:26, 4/3/2013 and 06:57, 4/3/2013. In the following days, this traffic appeared three more times, each time after the network went down.

 

 

Event 7:

Situation: The webmail servers 172.30.0.3 and 172.20.0.3 simultaneously scanned the network from their port 80. (The graph view links “firstSeenSrcIp” and “firstSeenSrcPort” values and is laid out with a force-directed algorithm. Each source IP has its own unique source port, so we conclude that the webmail servers themselves scanned the network; otherwise, the source IPs would have been linked to one another through shared ports.)

Time: 03:30, 4/3/2013 ~ 06:50, 4/3/2013

 

Event 8

Situation: The computer with “firstSeenSrcIp” 10.9.81.5 is very suspicious. Between 09:30, 4/3/2013 and 11:25, 4/3/2013 it was scanned by the web server of Enterprise Site 3 (“firstSeenDestIp” 172.30.0.4); then, between 09:27, 4/6/2013 and 03:19, 4/7/2013, it scanned the computers with “firstSeenDestIp” 172.10.0.4, 172.10.0.5, 172.10.0.9, and 172.20.0.6.

 

Event 9

The computer with “firstSeenSrcIp” 172.20.0.15 scanned the computers with “firstSeenDestIp” 10.6.6.7, 10.12.15.152, 10.15.7.05, 10.70.68.127, and 10.250.178.101 from its port 80 between 09:30, 4/3/2013 and 07:06, 7/6/2013.

 

Week Two

 

Event 10

In the Network Health and Status data of Week Two, the DC, Email, Web, and DNS servers reported problems throughout the week, and their “numProcs”, “loadAveragePercent”, and “physicalMemoryUsage” fields are all empty.

 

Event 11

Situation: The DC, Email, Web, and DNS servers were periodically accessed on port 3389 (Remote Desktop Protocol) by computers with IPs 10.6.6.7, 10.12.15.152, and 10.13.XX.XX, which suggests attempts to control these servers remotely. Red marks inbound connections; records shown in green have an empty direction field.

 

 

Event 12

Most records in the Week Two IPS dataset are warnings, which also appear periodically. These warnings are TCP records. The following table describes this event.

Color   Source IP      Source Ports   Destination IP
Red     10.12.15.152   37551, 37552   10.0.2.2~10.0.2.8
Green   10.6.6.7       46396, 46397   10.0.4.2
Blue    10.17.15.10    40598, 40599   10.0.2.2
Purple  10.12.14.15    51447, 51448   10.0.3.2~10.0.3.5
Cyan    10.13.77.49    61699, 61700   10.0.2.2

 

 

 

MC3.2 – Speculate on one or more narratives that describe the events on the network. Provide a list of analytic hypotheses and/or unanswered questions about the notable events. In other words, if you were to hand off your timeline to an analyst who will conduct further investigation, what confirmations and/or answers would you like to see in their report back to you? Your answer should be no more than 300 words long and may contain up to three additional images.

 

For the Week One dataset, our first hypothesis is that the webmail servers were infected at the beginning of the week or had already been infected before it. The following image shows that the webmail servers were very active before the network went down.

 

 

The second hypothesis concerns the router: unusual events happened between 05:35, 4/3/2013 and 05:36, 4/3/2013. This single minute contains 4435845 records, so we used the graph view to examine the connections between source IPs and destination IPs. In this image, 172.10.0.6 is a hub connected to many other nodes, and 172.0.0.1 is another hub.
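The two steps behind the second hypothesis (finding an abnormally busy minute, then finding hub nodes in the source-destination graph) can be sketched in a few lines. This is an illustration under stated assumptions, not AVIST's graph view: it assumes timestamps are already parsed into `datetime` objects and that the graph is given as (source IP, destination IP) pairs; the function names and toy data are hypothetical. Hubs are ranked by degree, i.e. the number of distinct peers.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Set, Tuple


def records_per_minute(times: Iterable[datetime]) -> Counter:
    """Count records in each one-minute bucket."""
    return Counter(t.replace(second=0, microsecond=0) for t in times)


def top_hubs(pairs: Iterable[Tuple[str, str]], k: int = 2) -> List[Tuple[str, int]]:
    """Rank IPs by degree (number of distinct peers) in the undirected
    source-destination graph and return the top `k`."""
    peers: Dict[str, Set[str]] = defaultdict(set)
    for src, dst in pairs:
        peers[src].add(dst)
        peers[dst].add(src)
    ranked = sorted(((ip, len(p)) for ip, p in peers.items()),
                    key=lambda x: x[1], reverse=True)
    return ranked[:k]


# Toy data for illustration only.
times = [datetime(2013, 4, 3, 5, 35, s) for s in (1, 30, 59)] + [datetime(2013, 4, 3, 5, 36, 0)]
busiest, count = records_per_minute(times).most_common(1)[0]
print(busiest, count)  # 2013-04-03 05:35:00 3

pairs = [("172.10.0.6", f"10.0.0.{i}") for i in range(5)] + \
        [("172.0.0.1", f"10.1.0.{i}") for i in range(3)]
print(top_hubs(pairs))  # [('172.10.0.6', 5), ('172.0.0.1', 3)]
```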

 

 

The third hypothesis concerns the relationship between Week One and Week Two. The Week Two network flow dataset is much smaller than that of Week One, so we hypothesize that the network problems of Week One persisted into Week Two and grew worse. The following image shows the network flow dataset and the IPS dataset synchronized in time. The two datasets show similar patterns, but the IPS data contains more information than the network flow data. We therefore suspect that the network's logging servers were damaged.

 

MC3.3 – Describe the role that your visual analytics played in enabling discovery of the notable events in MC3.1. Describe whether your visual analytics play a role in formulating the questions in MC3.2. Your answer should be no more than 300 words long and may contain up to three additional images.

 

 

AVIST proved effective and efficient for this task. After loading the data, users can apply filters to remove common connections, such as traffic on ports 80 and 21, and then play the animation. Users typically use the histogram view and the control panel to find the IPs or ports to filter. The following image shows how to remove port 80 from the visualized data views.

Users can spot common scanning patterns in the parallel coordinate view while the animation plays. The following image shows the animation after we removed UDP, other protocols, and port 80. Users can then apply the highlight filter and combine the parallel coordinate view with the dynamic view to see how the highlighted records are distributed across the whole dataset. In the following image we focus on source IP 172.10.0.6; different colors can also be assigned to compare records and find patterns.

 

Users can also form hypotheses or verify earlier findings with AVIST. In Event 2, we found that 172.10.0.6 used the Big Brother network monitor port (1984) and port 0. Does 172.10.0.6 connect only through these two ports, or through other ports as well?

To answer this, we used AVIST to select the network flow records involving 172.10.0.6 while excluding ports 1984 and 0. AVIST showed that 172.10.0.6 still has connections on other ports.
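The same verification query can be expressed outside AVIST. The sketch below assumes flow records are dicts keyed by the network flow column names, with ports as integers, and it looks only at the source side (the ports 172.10.0.6 itself used), which is our interpretation of Event 2. The function name and toy records are hypothetical; the output shows only what the toy data contains, not the actual ports found in the challenge data.

```python
from collections import Counter
from typing import Any, Dict, FrozenSet, Iterable


def other_ports(
    records: Iterable[Dict[str, Any]],
    ip: str,
    excluded: FrozenSet[int] = frozenset({0, 1984}),
) -> Counter:
    """Count the source ports used by `ip`, ignoring the ports already
    explained (0 and 1984 by default)."""
    return Counter(
        r["firstSeenSrcPort"]
        for r in records
        if r["firstSeenSrcIp"] == ip and r["firstSeenSrcPort"] not in excluded
    )


# Toy records; the real query would run over the Week One network flow data.
flows = [
    {"firstSeenSrcIp": "172.10.0.6", "firstSeenSrcPort": p}
    for p in (0, 1984, 1984, 22, 22, 443)
] + [{"firstSeenSrcIp": "172.20.0.15", "firstSeenSrcPort": 80}]
print(other_ports(flows, "172.10.0.6"))  # Counter({22: 2, 443: 1})
```

A non-empty result answers the question in the same way AVIST did: the host has connections on ports other than 0 and 1984.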